Text Clustering Quality Improvement Using a Hybrid Social Spider Optimization
Abstract
Text document clustering is one of the most widely studied data mining problems. It organizes text documents into groups such that each group contains similar documents. Several issues arise when grouping text documents, chief among them accuracy and efficiency. Because the clustering problem can be mapped to an optimization problem, researchers have recently used evolutionary optimization techniques to improve both. Evolutionary techniques are stochastic, general-purpose methods for solving optimization problems. Swarm intelligence is one such technique; it deals with the collective behavior of swarms and their complex interactions without any supervision. In this paper, we propose a novel swarm intelligence algorithm called Social Spider Optimization (SSO) for text document clustering. We compare it with K-means clustering and other state-of-the-art clustering algorithms such as PSO, ACO, and Improved Bee Colony Optimization (IBCO), and find that it gives better accuracy. We then propose two hybrid clustering techniques, SSO + K-means and K-means + SSO, and find that SSO + K-means outperforms K-means + SSO, KPSO (K-means + PSO), KGA (K-means + GA), KABC (K-means + Artificial Bee Colony), and Interleaved K-means + IBCO. We use the sum of intra-cluster distances, average cosine similarity, accuracy, and inter-cluster distance to measure the performance of the clustering techniques.
Similar resources
A Joint Semantic Vector Representation Model for Text Clustering and Classification
Text clustering and classification are two main tasks of text mining. Feature selection plays a key role in the quality of the clustering and classification results. Although word-based features such as term frequency-inverse document frequency (TF-IDF) vectors have been widely used in different applications, their shortcoming in capturing semantic concepts of text motivated researchers to use...
GROUND MOTION CLUSTERING BY A HYBRID K-MEANS AND COLLIDING BODIES OPTIMIZATION
The stochastic nature of earthquakes poses a challenge for engineers in choosing which records to use for their analyses. Clustering is offered as a solution to this data mining problem, automatically distinguishing between ground motion records based on similarities in the corresponding seismic attributes. The present work formulates an optimization problem to seek the best clustering measures. In...
A Comparative Analysis of Particle Swarm Optimization and K-means Algorithm For Text Clustering Using Nepali Wordnet
The volume of digitized text documents on the web has been increasing rapidly. As there is a huge collection of data on the web, there is a need for grouping (clustering) the documents for speedy information retrieval. Document clustering collects documents into groups such that the documents within each group are similar to each other and not to documents of other groups...
Hybrid semantic clustering of hashtags
Clustering hashtags based on their semantics is an important problem with many applications. The uncontrolled usage of hashtags in social media, however, makes the quality of semantics and the frequency of usage vary a lot, and this poses a challenge to the current approaches which capitalize on either the lexical semantics of a hashtag (by using metadata) or the contextual semantics of a hasht...
A Hybrid Data Clustering Algorithm Using Modified Krill Herd Algorithm and K-MEANS
Data clustering is the process of partitioning a set of data objects into meaningful clusters or groups. Because clustering algorithms are used so widely in many fields, much research is still under way to find the best and most efficient clustering algorithm. K-means is simple and easy to implement, but it is sensitive to the initialization of cluster centers and hence can become trapped in a local optimum. In this paper...